Increasing morphological disparity and decreasing optimality for jaw speed and strength during the radiation of jawed vertebrates

The Siluro-Devonian adaptive radiation of jawed vertebrates, which underpins almost all living vertebrate biodiversity, is characterized by the evolutionary innovation of the lower jaw. Multiple lines of evidence have suggested that the jaw evolved from a rostral gill arch, but when the jaw took on a feeding function remains unclear. We quantified the variety of form in the earliest jaws in the fossil record from which we generated a theoretical morphospace that we then tested for functional optimality. By drawing comparisons with the real jaw data and reconstructed jaw morphologies from phylogenetically inferred ancestors, our results show that the earliest jaw shapes were optimized for fast closure and stress resistance, inferring a predatory feeding function. Jaw shapes became less optimal for these functions during the later radiation of jawed vertebrates. Thus, the evolution of jaw morphology has continually explored previously unoccupied morphospace and accumulated disparity through time, laying the foundation for diverse feeding strategies and the success of jawed vertebrates.


INTRODUCTION
Almost all living vertebrates are jawed vertebrates (1). The origin and early evolution of jaws is among the most formative of events in vertebrate evolutionary history, precipitating profound changes in predator-prey relationships and the foundations of extant vertebrate biodiversity (1,2). Many biomechanically novel feeding behaviors were established early in the evolution of jawed vertebrates, including complex linkage systems and high stress mitigation for durophagy and processing armored prey (3)(4)(5), all of which have contributed to the ecological success of vertebrates (1). The tempo and mode of this evolutionary episode remains poorly characterized, but those few studies that have investigated jaw functional disparity have perceived stasis throughout much of the initial gnathostome radiation (6,7), apparently contradicting the traditional view that gnathostome diversification was predicated on the innovation and broad ecological utility of the jaw. Here, we use a new approach to investigate this apparent limit on the evolution of the jaw via characterizing the range of theoretical forms accessible to early gnathostomes. We assess the functional capability of these theoretical gnathostome jaw shapes, testing which shapes are optimal for stress resistance, speed of closure, or a trade-off between these two traits. Following this, we document the temporal distribution of empirical jaw morphologies and those of inferred ancestors within theoretical morphospace to test whether gnathostome jaws were constrained by functional optimality during their early evolution. We evaluate the empirical record of jaw evolution in early jawed vertebrates within this functional context. We characterized the mandibular morphology of 121 early gnathostomes (late Silurian to the end Devonian; ~427 to 359 million years (Ma) ago through elliptical Fourier analysis (EFA) (8) of two-dimensional (2D) lateral images. We then generated a theoretical morphospace of mandible morphologies representing the tangent space of empirical shapes (9)(10)(11). We used functional testing to establish how strength and rotational efficiency (RE) vary through theoretical shape space, as these metrics are critical to feeding function; speed of jaw closure has been implicated in capture of fast-moving prey, while jaw strength has been linked to bite ability and the procurement of harder foodstuffs (12,13). The two traits may trade-off due to the lever-like nature of the vertebrate jaw typically considered to generate fast versus forceful closure, although similar output metrics can be reached via different morphologies [e.g., (14)]. This allowed us to establish an adaptive landscape within the theoretical morphospace, as has been effective in previous studies constructing performance surfaces and adaptive landscapes (13,(15)(16)(17)(18)(19). However, our adaptive landscape is constructed differently to those studies that fit models to performance data. Instead, we use a new Pareto ranking approach (see Methods) that highlights optimal morphologies without using fitness functions weighted toward particular ecologies or groups (20,21), providing a general picture of optimality among extinct taxa, where survival data (22) are unavailable and ecological data are incomplete (see Fig. 1 for an overview of methods). Pareto concepts have been used before in the interpretation of morphospace and adaptive landscapes; however, the construction of adaptive landscapes within a Pareto framework has not been previously used (15-17, 19, 23).
Our metric of optimality refers to functional optimality, not true "fitness," which is much more complex. Using a phylomorphospace approach in which we model the mandibular shapes for the ancestors of the earliest known jawed vertebrates, we characterized the phylogenetic exploration of this adaptive landscape, demonstrating that the earliest mandibles exhibited morphologies that were optimal for strength and RE. Furthermore, far from plateauing, early jawed vertebrates explored an increasing range of jaw morphospace that tracked optimal adaptive regions early in their evolution. However, subsequent sarcopterygian evolution is characterized by a shift 1 toward less optimal regions of morphospace, perhaps reflecting a shift away from the functional drivers that characterize the initial gnathostome radiation or that functional constraints weaken over time, becoming less restrictive in the evolution of jaw form.

Theoretical morphology
We used 12 size and rotation-corrected elliptical harmonics (45 metrics in total) output from EFA to characterize 121 gnathostome jaw shapes in 2D (Figs. 1 and 2). A principal components analysis (PCA) of the empirical shapes captures 88.6% of variation by the first five PC axes and 70.9% in the first two axes. By modifying the harmonic dataset, 483 theoretical shapes were generated in an evenly spaced 23-by-21 grid across the PC1-PC2 morphospace within an area that encompassed the range of realized jaw form plus a border of an extra 20% the range of PC1 (Fig. 2). This meets the expectations of a theoretical morphospace, since its dimensions are geometric models of form and it encompasses morphological variation that extends beyond that observed in empirical data (10). It is commonly argued that theoretical morphospaces are not constructed with reference to measurement data from existing form (10) as ours is. However, we reject this qualification, since, from their inception, theoretical morphospaces have been based on measurement data from existing form. For example, the seminal theoretical morphospace analyses by Raup (24,25) were preceded and explicitly informed by his characterization of the coiling parameters of gastropods based on empirical measurement data (26). Our approach allows for a much broader application of the theoretical morphospace approach, which has largely been limited to geometrically simple biological structures [e.g., (10)].
Within this article, we refer to jaw shapes by their position in theoretical morphospace, which spans from −0.26 to 0.28 in PC1 and −0.26 to 0.24 in PC2. In general, increasing PC1 represents increasing jaw depth and decreasing jaw length, while increasing PC2 represents the shift from a more convex to a more concave dorsal surface. Extended regions show geometrically viable jaw morphologies in high PC1 coordinates, but the lower PC1 borders show regions of self-intersecting geometry. Low PC1 regions are therefore geometrically impossible regions of morphospace [cf. (9,27)]. Other extreme areas may be geometrically infeasible in nature due to poor articulation surfaces with the skull, but this is not testable with jaw morphology alone.

Functional performance of theoretical morphologies
We conducted functional analysis on the theoretical shapes. All taxa in the dataset are aquatic and encountered obstacles imposed by aquatic feeding, such as the bow wave produced by extending jaws (28). Yet, they also benefit from feeding in a dense medium where suction feeding is feasible for prey capture (14,29,30). The first functional metric that we analyzed therefore was the RE of the jaw as a proxy for speed of the jaw opening and closing due to its role in defining the time to peak gape and therefore suction feeding performance (5,29,(31)(32)(33). We defined RE as the speed of the jaw tip, given EFA results were input to a PCA to build a theoretical morphospace of evenly spaced theoretical jaw shapes. (C) Each theoretical shape is tested 1000 times with random input constraints using finite element analysis (FEA) to assess their functional performance in RE and VMS. (D) Each shape was plotted in performance space with its individual performance metrics. These shapes are then ranked using a Pareto system, with the assumption that lower VMS (higher strength) is optimal and that higher RE (higher speed and efficiency) is optimal. (E) Ancestral state reconstruction (ASR) is used with EFA data and a timed phylogeny to construct a phylomorphospace.
(F) The Pareto rank from the performance space is used to construct an adaptive landscape, and the evolution of taxa within this adaptive landscape is observed via the phylomorphospace.
one unit of angular kinetic energy, which is dependent on the length and the moment of inertia of jaw shape (5). Stress resistance was the second functional metric determined, as it is a common adaptive feature tested in feeding systems to gauge resistance to the forces generated during biting (13,18). For inferring stress resistance, we used the median von Mises stress (VMS) of a finite element model subject to jaw loading: Models were fixed at a jaw hinge and anterior bite point and loaded with muscle force. To account for uncertainty in boundary condition position and orientation for the theoretical jaw shapes, we varied the locations of the jaw hinge and bite position over 5% of the total length of the jaw for the calculation of RE (Fig. 3A). Muscle force location was also varied over 5% total jaw length, and muscle force orientation was varied 45° in either direction. All theoretical shapes were analyzed with these constraints and randomized 1000 times, resulting in a total of 430,000 finite element analysis (FEA) models and 430,000 calculations of RE. To infer the performance of jaw shape only, we standardized all theoretical jaw morphologies to the same surface area. Performance surfaces were constructed from the mean (Fig. 3) and 5th and 95th percentile performance values of each functional metric (figs. S6 to S8). The RE performance surface (Fig. 3A) shows a clear relationship between shape and speed. Geometrically viable regions at low PC1 show greater RE compared to high PC1 regions, and more intermediate PC2 values have greater RE than the extremes of PC2. The RE is well characterized by a second order polynomial surface [sum of squared estimate of errors (SSE) = 0.3619, root mean square error (RMSE) = 0.0293, coefficient of determination (R 2 ) = 0.9280, and degrees of freedom (DOF)-adjusted R 2 = 0.9272], specifically a hyperbolic paraboloid. The VMS performance surface (Fig. 3B) shows a radically different shape. Regions of low PC1 and PC2 show extremely high VMS and by inference poor performance, and the variation in VMS across the majority of the theoretical morphospace is minimal in comparison to the magnitude of the low PC1-PC2 spike in stress. This surface is not well characterized by a quadratic surface (SSE = 1.2213 × 10 7 , RMSE = 170.1191, R 2 = 0.6669, and DOF-adjusted R 2 = 0.6630). Log transforming the VMS data produces a clearer visualization of these results ( fig. S8). Given the assumption that, all else being equal, fitness increases with decreased VMS and increased RE, speed and strength are compromised within a trade-off, as longer structures with more mass distributed toward the pivot point (low PC1 values) rotate faster and experience larger stresses than shorter structures of more homogeneous thickness (high PC1 values).

Pareto optimality and the occupation of theoretical morphospace
Plotting the strength of each theoretical jaw morphology against its speed ( Fig. 4A) highlights the trade-off in our chosen performance metrics and reveals that many theoretical shapes have low stress scores and intermediate RE (many-to-one mapping of form to function) (14). Within this "performance space," it is possible to establish the Pareto front of theoretical shapes, i.e., those theoretical shapes in which neither metric can be improved without deteriorating performance of the other (34).
To establish how early jaw evolution explored this performance space, we modeled the evolution of jaw shape on a time-scaled phylogeny of early jawed vertebrates, allowing us to reconstruct mandible morphology for the ancestors of the fossil jawed vertebrates sampled, including the ancestral jawed vertebrate ( fig. S9). These ancestral jaw shapes were projected into the performance landscape along with the sampled taxa (Fig. 4B). On this basis, we find that most early gnathostomes occupy the Pareto front, corroborating our prior view of the adaptive value and trade-off of the functional metrics tested. However, some taxa plot further from the optimal boundary in regions of relatively high stress and intermediate RE. Many of the jaws occupying suboptimal regions of performance space independently migrated to this region from the Pareto front and did so by decreasing strength rather than decreasing RE (Fig. 4B). This migration may be the result of another attractor in performance space caused by an additional functional metric not tested here or optimizing stress resistance becomes much less important to these taxa than optimizing RE or both.
We developed a Pareto rank algorithm that assigns each theoretical shape a value from 0 (suboptimal) to 1 (optimal) (see Methods). Plotting the Pareto rank of each theoretical shape on the z axis above the morphospace generates the adaptive landscape of theoretical shapes (Fig. 4C, blue scale from 0 to 1). The optimality rank

of 12
of empirical taxa was extrapolated from their location on this surface (Fig. 4C, data points). The majority of early jawed vertebrates occupy a Pareto optimal region of space, with some taxa showing extension into suboptimal space. One explanation for this phenomenon may be that "suboptimal" taxa in the PC1-PC2 plane lie in higher-dimensional space that is optimal. If this were the case, then we may expect taxa further away from the PC1-PC2 plane to lie in the less optimal regions on the PC1-PC2 plane. No significant correlation between the Euclidean distance from individual taxa to the plane of theoretical morphospace and their extrapolated optimality was found ( = −0.0217 and P = 0.8127). Therefore, taxon suboptimality is unlikely to be due to shape variation that is not captured by PC1 and PC2.

Disparity and optimality
Early jawed vertebrate taxa generally occupy only optimal regions of stress and speed function, with the exception of some sarcopterygian taxa that occupy suboptimal performance regions at lower PC2 values ( Fig. 4, B and C). In particular, low optimality sarcopterygian taxa include Nesides (0.1239), Soederberghia (0.3389), Diplocercides (0.3462), Serenichthys (0.3781), and Miguashaia (0.4067). These are coelacanths and a lungfish, suggesting some phylogenetic correlation to suboptimality. The phylomorphospace (Fig. 4C) shows that many taxa have independently evolved similar morphologies, evidencing a widespread convergence in jaw shape. A multivariate K statistic (K mult ) (35) showed a weak but significant phylogenetic signal in the shape data (K mult = 0.37425, P = 0.0001). Focusing on the empirical data, we measure disparity in our dataset through two metrics (sum of variances and mean pairwise distance), which were chosen due to their robusticity to sample size (36). Mean pairwise distance shows significant steady increase through evolutionary time, while the sum of variances shows a similar trend that is not significant (Fig. 5 and dataset S1C). However, measurements from consecutive time bins show large overlap in bootstrap confidence intervals in all metrics (Fig. 5). This shows the disconnection between patterns of morphological and functional disparity (6), which could be due to the "many forms, one function" nature of morphology, or the mechanical sensitivity of the system, which has been demonstrated  to affect disparity measures (14,37). Mean taxon optimality for each time bin shows a steady decrease with time and has a significant negative relationship with mean pairwise distance ( Fig. 5 and dataset S1D).

DISCUSSION
We find that a large range of theoretical shapes exhibit optimal performance within a tradeoff for RE and jaw strength, particularly those with low curvature and a mass distribution that is weighted toward the jaw articulation (Fig. 4). This is because the jaws that harbor more mass close to their pivot have inherently lower rotational inertia and therefore higher RE while still maintaining large areas of mass for stress distribution. Pareto ranking characterizes the optimality of a set of solutions (in this case, theoretical morphologies) relative to one another when considering their individual performances (in this case, their strength and speed). This does not denote the adaptive value or fit of a jaw to a specific ecological role. We do not assess the relative importance of different functional metrics, as has been successful in studies of extant organisms where accurate ecological data are available (13,(15)(16)(17)19). Here, we interpret the optimality of each jaw as a 2D vector of weights, w, which determines the adaptive value of the two dimensions of performance (strength and speed, represented in a 2D performance space; Figs. 1 and 4, A and B). We do not need to know the precise magnitude or direction of w, as Pareto ranking allows us to identify the range of optimal morphologies given only the quadrant that w occupies. In practice, this is equivalent to our assumption that higher RE is adaptive and higher VMS is maladaptive. Despite many taxa exhibiting optimal ranking for the speed-strength trade-off, we also find that there are large areas of space unoccupied by empirical taxa that are ranked highly within our system. Taxa may not explore these areas, because they exhibit dichotomous performance-the morphologies in that space perform exceptionally in one metric but very poorly in the other.
We find that the earliest gnathostome jaws, and their inferred ancestral stages, have mandibles that are optimized for a tradeoff for speed and strength, therefore supporting a predatory function. The distribution of taxa within the boundaries of our theoretical jaw morphospace demonstrates that their jaws were optimized for this tradeoff (Fig. 4 and fig. S10). Thus, optimality was achieved very early in jaw evolution, and our time-sliced performance space reveals that much of the subsequent exploration of shape space tracks the optimality front (Fig. 5). This pattern was maintained in each major clade or grade, with Placodermi, Chondrichthyes (here including acanthodians), Actinopterygii, and Sarcopterygii all exhibiting optimal jaw morphologies early in their evolution. Disparity increased as placoderms and sarcopterygians diversified into opposing regions of morphospace. Within the placoderms, many arthrodires occupy higher PC2 regions, representing a shift to stronger and less rotationally efficient morphologies. In the extreme case, Gorgonichthys represents a diversification into suboptimal shape space, reflecting a shift toward decreased strength while maintaining RE. Sarcopterygians diversified into shape space are also characterized by increased strength but lower speed efficiency (higher PC1 coordinates). Lungfish and actinistians independently evolved suboptimal morphologies that have characteristics comparable to Gorgonichthys: decreasing strength while maintaining RE. Lower strength in lungfish jaws is unexpected and perhaps inconsistent with the durophagous features of dipnoan jaws; we hypothesize that this is due to variation in the medial aspect not captured by our analysis or the loss of data to higher PC axes. Despite exploring different extremes of morphospace, Sarcopterygii and Placodermi exhibit functional convergence. Chondrichthyans and actinopterygians remained confined to their initial range of morphologies within Pareto optimal space.
Our results are compatible with the hypothesis that the mandibles of the earliest jawed vertebrates were optimized for prey acquisition and processing (38)(39)(40). Rather than diversifying through shape space until an optimal morphology is achieved, the early evolution of jawed vertebrates is characterized by diffusion among equally optimal but morphologically disparate jaw morphologies. It is inevitable that the known fossil record misses an initial evolutionary episode of mandibular evolution, but our phylomorphospace approach, in which we infer morphologies ancestral to those sampled, diminishes the impact of this formal possibility. Deviation from morphologies optimized for prey acquisition and processing is a feature of later phylogenetic history. While the variety of mandibular functional morphology remains static (6), mandibular shape disparity increases through time (Fig. 5), made possible by the evolutionary discovery of morphologically divergent mandibles of the same optimality (Fig. 4A). Despite the repeated evolution of Pareto optimal morphologies as taxa go extinct and new taxa originate, average jaw optimality decreases with increasing time and disparity.
Decreasing optimality is caused largely by the independent evolution of taxa with jaws that are weaker but otherwise maintain RE. The most optimal jaw shapes occupy a band of morphospace across PC1 coincident with largely straight jaws. Shifting into negative or positive PC2 space results in suboptimal jaws that are convex or concave, respectively. Thus, the shift from the Pareto front may be related to the repeated evolution of jaws that are concave or convex and therefore not optimized for a strength-speed trade-off. This may be due to the introduction of an ecological factor that may have changed the importance of the adaptive criterion of stress resistance or added new attractors in performance space. Examples of ecological change might be the emergence of ram feeding, lunge feeding, and other planktivorous strategies (41,42) that are unlikely to require strong jaws but still rely on efficient jaw movement. Durophagy is another new feeding mode established within this evolutionary episode, but adaptation to hard food diets has often been associated with stronger jaw morphologies (3,18,43). Decreasing optimality within our system is more likely driven by planktivorous feeding modes than durophagy, as the former is consistent with the pattern of decreasing strength and constant RE in suboptimal jaws. However, this pattern may be an artefact of our 2D approach, since, unlike the majority of jaws sampled for our study, durophagous jaws have a complex 3D morphology. Similarly, the perceived shift away from Pareto optimal morphologies may also reflect the evolution of new musculature systems that redefine the loading conditions of the biting mandible (44)(45)(46), resulting in contrasting stress patterns and magnitudes being subject to selection. Size may be another important factor, as larger (longer) jaws in our dataset occur at the limits of variation within the empirical data ( fig. S11). However, most of the size variation is evenly spread across morphospace. Alternatively, functional constraints on jaw morphology may have weakened through time or weakened with increasing disparity, as increasingly less of optimal morphospace remains to be discovered.
In any instance, our results reveal a pattern of increasing mandibular morphological disparity with clade diversification despite stasis in the evolution of jaw functional disparity (6). This difference reflects the interrogative nature of our integrative theoretical morphology, functional optimality, and phylomorphospace approach to analyzing the evolution of form and function. It reveals that, although there may be a rapid rise to stasis in variance of phenotypic characters linked to function, this does not equate to stasis in phenotypic evolution, since our analyses demonstrate disparate morphologies that are equally Pareto optimal. The early evolution of jawed vertebrates is therefore characterized by the progressive exploration and convergence upon functionally equivalent phenotypes.
Following strong functional constraint early in gnathostome jaw evolution, increasing morphological disparity coupled with decreasing optimality suggests that the landscape of optimality roughens with the emergence of new, functionally relevant anatomical innovations and that functional constraints (strength and speed) on morphospace occupation may have relaxed over time. This provides broader insight into questions surrounding the evolution of disparity through clade history, supporting a view that morphospace may not necessarily become saturated after an early burst in disparity (47,48). Rather, disparity can continue to increase, as patterns of functional optimality are rearranged and complexified within theoretical morphospace, reflecting the adaptation of taxa to different functions and trade-offs in response to their changing environment (exploitation of new prey or of old prey in a new way). Our results provide not only a further example of continuous disparity increase through clade history but also a causal mechanism for disparity accumulation: many divergent morphologies initially converging and then diverging from functional optima. Our approach to investigating disparity has allowed us to reach beyond traditional qualitative assessments of functional constraints on morphospace occupation by empirical datasets to quantitative testing of functional limits of theoretically plausible forms. By focusing functional analysis on theoretical morphospace, we can test adaptive hypotheses on the evolution of morphology and the accumulation of disparity while avoiding prior assumptions of fitness.

Dataset
Our dataset consists of lower jaw shapes of 121 extinct gnathostome taxa ranging in age from the late Silurian to the end of the Devonian. Data were time binned, using age data sourced from the literature and the Paleobiology Database (dataset S1A), to each of the Devonian stages, while Silurian taxa were grouped into one time bin representing the late Silurian due to the low data availability. Taxa that existed in more than one bin were ranged through and counted in all intermediate time bins. The dataset was also split into four clades: Sarcopterygii (N = 57), Placodermi (N = 48), Chondrichthyes (including acanthodians) (N = 8), and Actinopterygii (N = 8) (dataset S1A). Images of lower jaws were sourced from the literature using photographs of lateral shape, reconstructions, and, where available, computed tomography scans. The lateral 2D shape of the jaw was chosen because of its prevalence in the fossil record, in figures within journal articles and books, and its previous use in the literature as a model for functional processes (5-7, 46, 49) due to many jaw shapes approximating a 2D planar shape. However, it is noted that the jaw shape in these taxa does have some 3D variation, and this will affect disparity and functional metrics (50).

Shape analysis
A further advantage to analyzing 2D jaw shape is that 2D filled polygons (such as the lateral jaw silhouette) can be characterized with zero converging error by curves with distinct functions, output from Fourier deconstructions (11,51). Specifically, EFA was the chosen method for morphometrics due to its shape characterization and reconstruction ability (8,11,52,53). Sensitivity tests of input outline data informed a decision to characterize the data with 600 outline landmarks, from which 12 EFA harmonics were generated. Six hundred were chosen, as it was the maximum number possible across all image files, and it was considerably beyond the point of convergence (see the Supplementary Materials). Size and rotation variation of each curve was eliminated (8), with the goal of characterizing the jaw shape alone. This resulted in a dataset of 45 continuous characters (4 × 12 − 3). From these data, a PCA was used to build a morphospace of maximum variation (11) within empirical shape data ( Figs. 1 and 2A).
The empirical 2D shapes are generated via a mathematical formula that plots elliptic harmonics. Changing the input parameters of this formula generates a proportional change in 2D shape. We exploited this process to generate a grid of parameterized theoretical jaw shapes (Figs. 1 and 2B). We produced a 23-by-21 grid of 483 of evenly spaced theoretical jaw shapes, which was plotted across the PC1-PC2 morphospace (Fig. 1). These 483 jaws were designed to cover the space occupied by empirical data plus an extended border range of 20% to infer the patterns at the extremes of unoccupied morphospace. Of these 483 meshes, 53 self-intersecting loops were omitted from any functional analysis, leaving 430 testable meshes. Self-intersection is an ostensibly impossible feature of any 3D structure in 2D lateral view; thus, these regions were defined as a geometrically impossible space (9,27).

Phylogeny
The phylogenetic tree used in all analyses was an informal supertree assembled from a multitude of literature sources (data S1A, fig. S9) (54)(55)(56)(57)(58)(59)(60)(61)(62)(63)(64)(65)(66)(67)(68)(69)(70)(71)(72). The tree included 99 of the 121 taxa from the analysis that could be found in phylogenies from the literature and was dated using the "equal" method of the function bin_timePaleoPhy in the R package paleotree (73). The R package geomorph was used to assess the phylogenetic signal of the data and perform maximum likelihood ancestral state reconstruction (functions physignal and gm.prcomp) (74). These functions were used due to their applicability to coordinate data, matching the EFA output data. Each axis of each elliptical harmonic was considered as a single 2D coordinate. Ancestral states were not included in the PCA to build the morphospace; instead, they were transformed into the co-ordinate space using the PC coefficients. All R analyses were performed with R v.3.6.1.

Functional analysis
Each of the 483 theoretical jaws was converted into meshes of 2500 triangular elements. A sensitivity test performed on 10 meshes incremented by 500 elements showed that, at 1000 random replications, a mesh density of 2500 elements was adequate for convergence in both functional metrics. We calculated RE, defined as the velocity of the tip of the jaw when rotating about the jaw hinge given one unit of energy. RE can be considered a proxy for jaw closure speed. The RE was thus calculated for each theoretical shape as the velocity of the bite point (v) given a rotational energy of 1 J, where v = L √ _ 2 ─ I where L is the length of the distance between bite point and the rotational axis (jaw hinge) and I is the moment of inertia at the bite point, where I can be calculated with discrete finite elements using this formula (5) where m = element mass, r = distance from the element center of mass to rotational axis, and N = the number of elements. As each theoretical jaw shape is standardized by area, their lengths still vary. This calculation uses the length of the size-standardized jaws and thus is equivalent to the length divided by the square root of jaw area. RE was first calculated taking the posterior-most node of the jaw outline with a near vertical normal as the initial jaw joint and the anterior-most node of the jaw outline with a near vertical normal as the initial bite point. Jaw joint and bite point positions were then bootstrapped over 1000 randomizations that varied the joint and bite location along 5% of the total outline length on either side of the initial placement. Mean and 95% confidence intervals were then generated for each theoretical shape. The functional landscape of mean values is reported here; see fig. S6 for 5th and 95th percentile results.
We calculated the VMS across each theoretical jaw shape mesh using a simple 2D constant strain triangle FEA algorithm in MATLAB. Each jaw was modeled as a thin plate of uniform thickness with a Young's modulus of 2 × 10 9 Pa and Poisson's ratio of 0.3. VMS has been used a proxy for strength in recent functional adaptive landscape studies (19,23,75) and has been used for a number of years as a measure of skull strength, particularly in comparative studies. Our functional landscapes for mean, maximum, and media VMS showed little difference, so we use median VMS in this study, which is less susceptible to high stress outliers generated at constraints (13,15,19,49,50). The same nodes were defined as bite and jaw joint positions as for the RE calculations. The location of muscle force is then interpolated along the perimeter of the jaw between these points, initially placed one-third of the length of the jaw from the jaw joint. Again, models were tested 1000 times with pseudorandomized input conditions, shifting the force node position by 5% of the total outline length on either side of the initial node and orienting the force direction 45° either side of the force node normal. Constraints of the jaw joint and bite position were not randomized for stress calculation due to the exponential increase in computational time that this would require and their relatively small effect on overall strain energy (76).
To explore the effect of raw jaw length on shape, we plotted raw jaw length against PC1 and PC2 scores for the empirical dataset and assessed the significance with a phylogenetic generalized least squares (PGLS). We find a significant relationship between raw jaw size and PC1 (P = 0.04129, R 2 = 0.03485) and an insignificant relationship between size and PC2 (P = 0.4846, R 2 = 0.0056). Both R 2 values were low; this appears to be due to the high shape variance in smaller jaws and the lower shape variance in larger jaws. This may suggest some developmental or functional restrictions on larger jaws. We also plotted each taxon data point as a function of jaw size on a PC1-PC2 plot. We find that some of the extremes of PC space are occupied by large jaws (e.g., the placoderms Gorgonichthys, Titanichthys, and Dunkleosteus); however, these taxa vary in shape from short deep jaws to longer thin jaws ( fig. S11).
The average median VMS value and 95% confidence intervals were then generated for each theoretical shape. As for RE, the functional landscape of mean values are reported in the main text; see figs. S7 and S8 for fifth and ninth percentile results. All functional algorithms were written in MATLAB by W.J.D. (available from Dryad, doi:10.5061/dryad.3bk3j9km8).

Pareto optimality
Building adaptive landscapes from functional metrics often uses least squares regressions or maximum likelihood to fit a first-or second-order polynomial relationship between morphology and performance and then performance and fitness (15,19,20,21). While this gives a good approximation of fitness under certain constraints and can highlight the relative importance of each functional trait measured, this approach cannot provide a single, universal fitness metric to disparate taxa with varying ecology. Instead, each group within the dataset (e.g., each ecology) has a unique adaptive landscape. Furthermore, in this study, the VMS surface is poorly characterized by the quadratic surface required for Arnold's fitness formulae (21). Here, we develop a new rank-based method of combining functional metrics into a single fitness metric, adapted from Pareto ranking algorithms (34,77). Pareto optimality has been used as a method of morphospace optimality analysis elsewhere with very promising results, although these studies operate on large-scale assumptions about the relationship between morphospace and function (78,79). We use the foundational concepts of Pareto optimality to rank morphospace location based on its functional performance.
The optimality of each theoretical morphology was ranked using a modified Goldberg Pareto ranking system (80). In many cases where the solutions to a problem (in this case, theoretical morphologies) experience a trade-off between N metrics of performance (in this case, there are two metrics: speed and strength), there exists a subset of those solutions that is Pareto optimal [Pareto optimal subset (POS)]. A solution is Pareto optimal if no other solution has better or equal performance in all metrics. We can take this concept further to generate a Pareto rank system, where the POS is assigned rank one and then removed from the sample of solutions. This allows a second POS to be found and assigned rank two. This POS is then removed from the second sample of solutions, and the process iterates until all solutions have been ranked. We develop this ranking for performance spaces with spatial occupation heterogeneity by ranking the dataset with Goldberg's ranking (optimal ranking, R O ) and then ranking it again with the optimality of each metric reversed (suboptimal ranking, R S ). The rank of the solution is then calculated via this equation This results in a linear rank from 0 to 1, with 1 denoting Pareto optimal (not dominated by any solution) and 0 denoting Pareto suboptimal (not dominant over any solution). Other Pareto ranking systems have been shown to be more effective in evolutionary algorithms (81,82); however, these methods are biased by the relative scales of functional metrics and the density of occupation of performance space. We opt not to use these methods to eliminate the requirement for scaling functional metrics and the bias caused by heterogeneous occupation of performance space, as equidistant theoretical forms in morphospace converge and diverge in function (Fig. 4A).

Disparity
Both disparity metrics were measured on EFA harmonic data and bootstrapped 10,000 times. The mean optimality of each time bin was calculated by bootstrapping taxon samples 10,000 times, extrapolating optimality from the surface at individual taxon PC scores, of which the mean was calculated. Trends in each signal were tested using a Spearman rank correlation test. The disparity (sum of variances and mean pairwise distance) against optimality relationships was tested with a standard Pearson's linear correlation test.

SUPPLEMENTARY MATERIALS
Supplementary material for this article is available at https://science.org/doi/10.1126/ sciadv.abl3644 View/request a protocol for this paper from Bio-protocol.